Margins and -ml-

The Probit model

Introduction

Previously, I have shown how to use -margins- after -ml-. I motivated the excercise, as an example for the estimation of nonlinear models.

The model, however, was not itself non-linear. It was, in fact, a linear regression model estimated using mle. This was done by impossing the assumption of normality on the errors of the model.

On the bright side, the outcome of interest: the standard deviation of the error and the outcome in actual scale (not its log) was nonlinear.

This time, I'll be making a similar excercise, using a more common nonlinear model: the probit model.

As in the previous example, I'll walk you through the setting up the ml program and the prediction program.

The Setup

First thing first. Unless you already have this program saved somewhere in your accesible ado files (most likely the "ado/personal" folder), make sure to have the following program in memory.

. program adde, eclass
  1.         ereturn `0'
  2. end

So, lets start.

The probit model is a nonlinear model that can be used when your dependent variable is a binary variable (0 - 1), and when your intention is to model what is the "probability of success (y=1)" given a set of characteristics. \( P(y=1|X) \)

From the technical point of view, you can think about the probit model in two ways.

$$ E(y=1|X) = P(y=1|X) = \Phi (X\beta) $$

$$ y^* = X\beta + e_i $$ $$ y^*>0 \rightarrow y=1$$ $$ y^*<=0 \rightarrow y=0$$ $$ y^* = X\beta + e_i $$

There is of course the added constraint that, for this to be a probit model you assume \( e_i \sim N(0,1) \) follows a standard normal distribution. Under this assumption, determining the probability of success \( P(y=1|X) \) is equivalent to determining the probability that \( P(y^*>0|x) \).

Using either approach, the likelihood of a single observation will be equal to (this will be important for the MLE): $$ L_i = \Phi(X\beta) \quad if y = 1 (success) $$ $$ L_i = 1-\Phi(X\beta) \quad if y = 0 (failure) $$ One must remember that we will be using the LOG of this expression for the MLE program. And of course, we also have the outcome of interest: $$ P(y=1|X) = \Phi(X\beta) $$ where, again, the \( \Phi \) is the cumulative distribution function (CDF) for a standard normal.

Probit MLE

First things first, we need to write the LogLikelihood function for the probit model.

. ***/
. program myprobit
  1.         args lnf xb 
  2.         local p1 normal(`xb')
  3.         local p0 normal(-`xb')
  4.         local y $ML_y1
  5.         qui:replace `lnf' = log(`p1') if `y'==1
  6.         qui:replace `lnf' = log(`p0') if `y'==0
  7. end

Notice that this program has only 2 arguments. The variable that will store the LogLikelihood "lnf", and the variable that will capture the observed component of the latent index. "xb".

To make this text easier to read, I use some locals to identify different components for my LL.

Now just need to write the "predict" program:

. program myprobit_p
  1.         syntax newvarname [if] [in] , [ pr odds *]
  2.         if "`pr'`odds'" =="" {
  3.             ml_p `0'
  4.         }
  5.         marksample touse, novarlist
  6.         if "`pr'" !=""  {
  7.             tempvar xb
  8.             _predict double `xb' , eq(#1)
  9.                 gen `typlist' `varlist' = normal(`xb') if `touse'
 10.                 label var `varlist' "P(y=1|X)"
 11.         }       
 12.         else if "`odds'" !=""  {
 13.             tempvar xb
 14.             _predict double `xb' , eq(#1)
 15.                 gen `typlist' `varlist' = normal(`xb')/normal(-`xb') if `touse'
 16.                 label var `varlist' "P(y=1|X)/P(y=0|X)"
 17.         }       
 18. end

This program,-myprobit_p-, will have two options.

The estimation

All right. With all this information in hand, we can now estimate our probit model.

To keep things reproducible, I will use a dataset from the Stata help file:

. webuse union, clear
(NLS Women 14-24 in 1968)

Now lets estimate the model using -ml-. Here we will indicate the "method" (lf) and the program that defines the Log likelihood.

. ml model lf myprobit (union = age grade not_smsa south##c.year) , maximize

initial:       log likelihood = -18160.456
alternative:   log likelihood = -14355.672
rescale:       log likelihood = -14220.454
Iteration 0:   log likelihood = -14220.454  
Iteration 1:   log likelihood = -13547.574  
Iteration 2:   log likelihood = -13544.385  
Iteration 3:   log likelihood = -13544.385  

. ml display

                                                Number of obs     =     26,200
                                                Wald chi2(6)      =     618.81
Log likelihood = -13544.385                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0118481   .0029072     4.08   0.000     .0061502     .017546
       grade |   .0267365   .0036689     7.29   0.000     .0195457    .0339273
    not_smsa |  -.1293525   .0202595    -6.38   0.000    -.1690604   -.0896445
     1.south |  -.8281078   .2472219    -3.35   0.001    -1.312654   -.3435619
        year |  -.0080931   .0033469    -2.42   0.016    -.0146529   -.0015333
             |
south#c.year |
          1  |    .005737   .0030917     1.86   0.064    -.0003226    .0117965
             |
       _cons |  -.6542487   .2007777    -3.26   0.001    -1.047766   -.2607316
------------------------------------------------------------------------------

You can compare the results with the standard -probit-.

Next, we need to "add" our predction program to e().

. adde local predict myprobit_p

And thats it. We can now estimate the marginal effects for the probit, assuming our outcome of interest is

. margins, dydx(*) predict(pr)  

Average marginal effects                        Number of obs     =     26,200
Model VCE    : OIM

Expression   : P(y=1|X), predict(pr)
dy/dx w.r.t. : age grade not_smsa 1.south year

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .003442    .000844     4.08   0.000     .0017878    .0050963
       grade |   .0077673   .0010639     7.30   0.000     .0056822    .0098525
    not_smsa |  -.0375788   .0058753    -6.40   0.000    -.0490941   -.0260634
     1.south |  -.1054928   .0050851   -20.75   0.000    -.1154594   -.0955261
        year |  -.0017906   .0009195    -1.95   0.051    -.0035928    .0000115
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. margins, dydx(*) predict(odds)  

Average marginal effects                        Number of obs     =     26,200
Model VCE    : OIM

Expression   : P(y=1|X)/P(y=0|X), predict(odds)
dy/dx w.r.t. : age grade not_smsa 1.south year

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0059585    .001467     4.06   0.000     .0030832    .0088338
       grade |    .013446   .0018647     7.21   0.000     .0097913    .0171007
    not_smsa |  -.0650526   .0102657    -6.34   0.000     -.085173   -.0449322
     1.south |  -.1720679   .0084276   -20.42   0.000    -.1885856   -.1555501
        year |  -.0032757    .001598    -2.05   0.040    -.0064076   -.0001438
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. margins south,  predict(odds)  

Predictive margins                              Number of obs     =     26,200
Model VCE    : OIM

Expression   : P(y=1|X)/P(y=0|X), predict(odds)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       south |
          0  |   .3626616   .0066416    54.60   0.000     .3496442     .375679
          1  |   .1905937   .0051289    37.16   0.000     .1805412    .2006462
------------------------------------------------------------------------------

So how do we interpret the results?. Here one possibility:

People Living in the sourth are, in average, 10% less likely to belong to a union.

or, Living in the south reduces the odds of belonging to a union in 0.172 points (from 0.3627 to 0.1906).

Thanks for reading.